Acronym Disambiguation: A Domain Independent Approach

نویسندگان

  • Aditya Thakker
  • Suhail Barot
  • Sudhir Bagul
چکیده

Acronyms are present in usually all documents to express information that is repetitive and well known. But acronyms can be ambiguous because there can be many expansions of the same acronym. In this paper, we propose a general system for acronym expansion that can work on any acronym given some context information it is used in. We present methods for retrieving all the possible expansions of an acronym from Wikipedia and AcronymsFinder.com. We propose to use these expansions to collect the context in which these acronym expansions are used and then score them using a deep learning technique called Doc2Vec. All these things collectively lead to achieving an accuracy of 90.9% in selecting the correct expansion for given acronym on a dataset we scraped from Wikipedia with 707 distinct acronyms and 14,876 disambiguations.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Processus global d'acquisition et de gestion des sigles

This paper deals with an acronym/definition extraction approach from textual data (corpora) and the disambiguation of these definitions (or expansions). Both steps of our global process of acquisition and management of acronyms are precisely described. The first step consists in using markers such as brackets to identify expansion candidates. The alignment of the letters allows to select the ac...

متن کامل

Automated Identification of Synonyms in Biomedical Acronym Sense Inventories

Acronyms are increasingly prevalent in biomedical text, and the task of acronym disambiguation is fundamentally important for biomedical natural language processing systems. Several groups have generated sense inventories of acronym long form expansions from the biomedical literature. Long form sense inventories, however, may contain conceptually redundant expansions that negatively affect thei...

متن کامل

Managing the Acronym/Expansion Identification Process for Text-Mining Applications

This paper deals with an acronym/definition extraction approach from textual data (corpora) and the disambiguation of these definitions (or expansions). Both steps of our global process of acquisition and management of acronyms are precisely described. The first step consists in using markers such as brackets to identify expansion candidates. The alignment of the letters allows to select the ac...

متن کامل

Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts

Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts and knowing their meaning is criti...

متن کامل

Extraction and Disambiguation of Acronym Meaning-Pairs in Medline

Acronyms are widely used in biomedical and other technical texts. Understanding their meaning constitutes an important problem in the automatic extraction and mining of information from text. Moreover, an even harder problem is sense disambiguation of acronyms; that is, where a single acronym, termed a polynym, has a multiplicity of meanings, a common occurrence in the biomedical literature. In...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017